AITopics

2605.06479

Country: North America > United States > Pennsylvania (0.40)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.49)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

arXiv.org Machine LearningApr-21-2026

Conformal Risk Control under Non-Monotone Losses: Theory and Finite-Sample Guarantees

Aldirawi, Tareq, Li, Yun, Guo, Wenge

Conformal risk control (CRC) provides distribution-free guarantees for controlling the expected loss at a user-specified level. Existing theory typically assumes that the loss decreases monotonically with a tuning parameter that governs the size of the prediction set. However, this assumption is often violated in practice, where losses may behave non-monotonically due to competing objectives such as coverage and efficiency. In this paper, we study CRC under non-monotone loss functions when the tuning parameter is selected from a finite grid, a setting commonly arising in thresholding and discretized decision rules. Revisiting a known counterexample, we show that the validity of CRC without monotonicity depends critically on the relationship between the calibration sample size and the grid resolution. In particular, reliable risk control can still be achieved when the calibration sample is sufficiently large relative to the grid size. We establish a finite-sample guarantee for bounded losses over a grid of size $m$, showing that the excess risk above the target level $α$ scales on the order of $\sqrt{\log(m)/n}$, where $n$ is the calibration sample size. A matching lower bound demonstrates that this rate is minimax optimal. We also derive refined guarantees under additional structural conditions, including Lipschitz continuity and monotonicity, and extend the analysis to settings with distribution shift via importance weighting. Numerical experiments on synthetic multilabel classification and real object detection data illustrate the practical implications of non-monotonicity. Methods that explicitly account for finite-sample uncertainty achieve more stable risk control than approaches based on monotonicity transformations, while maintaining competitive prediction set sizes.

artificial intelligence, machine learning, risk control, (19 more...)

2604.01502

Country:

North America > United States > New Jersey > Essex County > Newark (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Machine LearningMar-27-2026

Conformal Selective Prediction with General Risk Control

Bai, Tian, Jin, Ying

In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error control over cases where the model is trusted. We propose Selective Conformal Risk control with E-values (SCoRE), a new framework for deriving such decisions for any trained model and any user-defined, bounded and continuously-valued risk. SCoRE offers two types of guarantees on the risk among ``positive'' cases in which the system opts to trust the model. Built upon conformal inference and hypothesis testing ideas, SCoRE first constructs a class of (generalized) e-values, which are non-negative random variables whose product with the unknown risk has expectation no greater than one. Such a property is ensured by data exchangeability without requiring any modeling assumptions. Passing these e-values on to hypothesis testing procedures, we yield the binary trust decisions with finite-sample error control. SCoRE avoids the need of uniform concentration, and can be readily extended to settings with distribution shifts. We evaluate the proposed methods with simulations and demonstrate their efficacy through applications to error management in drug discovery, health risk prediction, and large language models.

large language model, machine learning, natural language, (17 more...)

2603.24704

Country:

North America > United States > Pennsylvania (0.40)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (0.92)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Neural Information Processing SystemsMar-22-2026, 19:10:31 GMT

Fast yet Safe: Early-Exiting with Risk Control

Scaling machine learning models significantly improves their performance. However, such gains come at the cost of inference being slow and resource-intensive. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing intermediate layers to exit and produce a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it'safe' for an EENN to go'fast'? To address this issue, we investigate how to adapt frameworks of risk control to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits only occur when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings, all the while preserving user-specified performance goals.

artificial intelligence, machine learning, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-18-2026, 13:57:23 GMT

ea5a63f7ddb82e58623693fd1f4933f7-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Neural Information Processing SystemsFeb-15-2026, 17:18:42 GMT

6eb05d8bc6bd7bb6868c64b5802125bd-Paper-Conference.pdf

experiment, machine learning, natural language, (20 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Vision (0.68)
(3 more...)

Neural Information Processing SystemsFeb-8-2026, 00:02:32 GMT

LocalizedAdaptiveRiskControl

The theoretical results highlight atrade-offbetween localization ofthe statistical risk and convergence speed to the long-term risk target.

artificial intelligence, justification, machine learning, (19 more...)

Country:

Oceania > Australia > New South Wales (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.68)

Hultberg, Bror, Zachariah, Dave, Ribeiro, Antônio H.

Anytime-Valid Conformal Risk Control

arXiv.org Machine LearningFeb-5-2026

Prediction sets provide a means of quantifying the uncertainty in predictive tasks. Using held out calibration data, conformal prediction and risk control can produce prediction sets that exhibit statistically valid error control in a computationally efficient manner. However, in the standard formulations, the error is only controlled on average over many possible calibration datasets of fixed size. In this paper, we extend the control to remain valid with high probability over a cumulatively growing calibration dataset at any time point. We derive such guarantees using quantile-based arguments and illustrate the applicability of the proposed framework to settings involving distribution shift. We further establish a matching lower bound and show that our guarantees are asymptotically tight. Finally, we demonstrate the practical performance of our methods through both simulations and real-world numerical examples.

artificial intelligence, machine learning, prediction, (17 more...)

2602.04364

Country:

North America > United States (0.14)
Europe > Sweden > Uppsala County > Uppsala (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningJan-1-2026

MultiRisk: Multiple Risk Control via Iterative Score Thresholding

Joshi, Sunay, Sun, Yan, Hassani, Hamed, Dobriban, Edgar

As generative AI systems are increasingly deployed in real-world applications, regulating multiple dimensions of model behavior has become essential. We focus on test-time filtering: a lightweight mechanism for behavior control that compares performance scores to estimated thresholds, and modifies outputs when these bounds are violated. We formalize the problem of enforcing multiple risk constraints with user-defined priorities, and introduce two efficient dynamic programming algorithms that leverage this sequential structure. The first, MULTIRISK-BASE, provides a direct finite-sample procedure for selecting thresholds, while the second, MULTIRISK, leverages data exchangeability to guarantee simultaneous control of the risks. Under mild assumptions, we show that MULTIRISK achieves nearly tight control of all constraint risks. The analysis requires an intricate iterative argument, upper bounding the risks by introducing several forms of intermediate symmetrized risk functions, and carefully lower bounding the risks by recursively counting jumps in symmetrized risk functions between appropriate risk levels. We evaluate our framework on a three-constraint Large Language Model alignment task using the PKU-SafeRLHF dataset, where the goal is to maximize helpfulness subject to multiple safety constraints, and where scores are generated by a Large Language Model judge and a perplexity filter. Our experimental results show that our algorithm can control each individual risk at close to the target level.

large language model, machine learning, natural language, (22 more...)

2512.24587

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

arXiv.org Artificial IntelligenceOct-30-2025

LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies

Sun, Ximan, Cheng, Xiang

Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the unconditional mean, at the LRT-gated mean, or a blend, exposing a continuum from exploitation to conservatism. We standardize states and actions consistently at train and test time and report a state-conditional out-of-distribution (OOD) metric alongside return. On D4RL MuJoCo tasks, LRT-Diffusion improves the return-OOD trade-off over strong Q-guided baselines in our implementation while honoring the desired alpha. Theoretically, we establish level-alpha calibration, concise stability bounds, and a return comparison showing when LRT surpasses Q-guidance-especially when off-support errors dominate. Overall, LRT-Diffusion is a drop-in, inference-time method that adds principled, calibrated risk control to diffusion policies for offline RL.

lrt, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2510.24983

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)